Developing Corpus of Lecture Utterances Aligned to Slide Components
Abstract
Formulating automatic text summarization as a maximum coverage problem with a knapsack constraint over a set of textual units and a set of weighted conceptual units is a promising approach. However, determining the appropriate granularity of the conceptual units for this formulation is both important and difficult. To address this problem, we are examining the use of presentation slide components as conceptual units for generating summaries of lecture utterances, in place of other possible conceptual units such as base noun phrases or important nouns. This paper describes the corpus we are developing to evaluate the proposed approach; it consists of presentation slides and lecture utterances aligned to slide components.
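As an illustration of the formulation mentioned in the abstract, the following is a minimal sketch, not the authors' method. It assumes that each lecture utterance (textual unit) covers a set of slide components (conceptual units), that each component has a weight, and that the summary length is bounded by a budget. The greedy gain-per-cost selection is a common heuristic for this kind of problem and is swapped in here for simplicity; the abstract does not say which solver is used. All names, weights, and lengths below are hypothetical.

```python
from typing import Dict, List, Set


def greedy_summary(
    units: Dict[str, Set[str]],   # utterance id -> slide components it covers
    lengths: Dict[str, int],      # utterance id -> cost (must be positive)
    weights: Dict[str, float],    # slide component id -> weight
    budget: int,                  # maximum total cost of the summary
) -> List[str]:
    """Greedy heuristic: repeatedly pick the utterance with the best
    newly-covered-weight / cost ratio that still fits in the budget."""
    selected: List[str] = []
    covered: Set[str] = set()
    used = 0
    remaining = set(units)
    while remaining:
        best, best_ratio = None, 0.0
        for u in sorted(remaining):          # sorted for deterministic ties
            if used + lengths[u] > budget:
                continue                     # would violate the knapsack constraint
            gain = sum(weights[c] for c in units[u] - covered)
            ratio = gain / lengths[u]
            if ratio > best_ratio:
                best, best_ratio = u, ratio
        if best is None:                     # nothing fits or nothing adds coverage
            break
        selected.append(best)
        covered |= units[best]
        used += lengths[best]
        remaining.remove(best)
    return selected


# Hypothetical toy data: three utterances aligned to four slide components.
example_units = {
    "u1": {"title", "bullet1"},
    "u2": {"bullet1"},
    "u3": {"bullet2", "figure"},
}
example_lengths = {"u1": 10, "u2": 4, "u3": 8}
example_weights = {"title": 1.0, "bullet1": 2.0, "bullet2": 2.0, "figure": 1.0}

summary = greedy_summary(example_units, example_lengths, example_weights, budget=18)
print(summary)
```

In this toy run the budget leaves no room for `u1` once `u2` and `u3` are chosen, so the slide title stays uncovered. A greedy pass like this gives no optimality guarantee in general; an exact solution would require an integer programming solver.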
Similar resources
Automatic Alignment Between Classroom Lecture Utterances and Slide Components
Multimodal alignment between classroom lecture utterances and lecture slide components is one of the crucial problems to realize a multimodal e-Learning application. This paper proposes the new method for the automatic alignment, and formulates the alignment as the integer linear programming (ILP) problem to maximize the score function which consists of three factors: the similarity score betwe...
A Korean Spoken Document Retrieval System for Lecture Search
In this paper, we introduced a Korean spoken document retrieval system for lecture search. We automatically build a general inverted index table from spoken document transcriptions, and we extract additional information from textbooks or slide notes related to the lecture. We integrate these two sources for a search process. The speech corpus used in our system is from a highschool mathematics ...
The Negochat Corpus of Human-agent Negotiation Dialogues
Annotated in-domain corpora are crucial to the successful development of dialogue systems of automated agents, and in particular for developing natural language understanding (NLU) components of such systems. Unfortunately, such important resources are scarce. In this work, we introduce an annotated natural language human-agent dialogue corpus in the negotiation domain. The corpus was collected...
Beyond "Next Slide, Please": The Use of Content and Speech in Multi-modal Control
The Intelligent Classroom is an automated lecture facility where one of the primary goals is that speakers be able to control it by interacting with it as they would with a human A/V technician. In this paper we describe our research in imbedding Microsoft Powerpoint into the Intelligent Classroom. In particular we discuss how we use two modes of sensing (Computer Vision and Speech Recognition)...
Supervised Spoken Document Summarization Jointly Considering Utterance Importance and Redundancy by Structured Support Vector Machine
In extractive spoken document summarization, it is desired to select important utterances from documents to construct the summary while avoiding redundancy among the selected utterances, but it is not easy to balance the two different goals. In this paper, a supervised spoken document summarization approach is proposed based on structured support vector machine (SVM), in which the above two goa...